Abstract

We show that two cooperating robots can learn exactly any strongly-connected directed
graph with n indistinguishable nodes in expected time polynomial in n. We introduce a new
type of homing sequence for two robots, which helps the robots recognize certain previously-seen
nodes. We then present an algorithm in which the robots learn the graph and the homing
sequence simultaneously by actively wandering through the graph. Unlike most previous learning
results using homing sequences, our algorithm does not require a teacher to provide
counterexamples. Furthermore, the algorithm can efficiently use any additional information available that
distinguishes nodes. We also present an algorithm in which the robots learn by taking random
walks. The rate at which a random walk on a graph converges to the stationary distribution
is characterized by the conductance of the graph. Our random-walk algorithm learns in
expected time polynomial in n and in the inverse of the conductance and is more efficient than
the homing-sequence algorithm for high-conductance graphs.
1 Introduction

Consider a robot trying to construct a street map of an unfamiliar city by driving along the city's 
roads. Since many streets are one-way, the robot may be unable to retrace its steps. However, it 
can learn by using street signs to distinguish intersections. Now suppose that it is nighttime and 
that there are no street signs. The task becomes significantly more challenging. 

In this paper we present a probabilistic polynomial-time algorithm to solve an abstraction of the 
above problem by using two cooperating learners. Instead of learning a city, we learn a
strongly-connected directed graph G with n nodes. Every node has d outgoing edges labeled from 0 to $d - 1$.
Nodes in the graph are indistinguishable, so a robot cannot recognize if it is placed on a node that 
it has previously seen. Moreover, since the graph is directed, a robot is unable to retrace its steps 
while exploring. 
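
For concreteness, the model can be sketched in Python as follows. This is an illustrative sketch only: the table representation `delta`, the helper names `step`, `reachable`, and `is_strongly_connected`, and the three-node example graph are our own assumptions and do not appear in the algorithms of this paper. A robot would only ever call `step`; the node numbers are hidden from it.

```python
from collections import deque

# Hypothetical example graph: n = 3 nodes, d = 2 outgoing edges per node.
# delta[u][i] is the node reached from u along the edge labeled i.
delta = [
    [1, 2],  # node 0: label 0 -> node 1, label 1 -> node 2
    [2, 0],  # node 1: label 0 -> node 2, label 1 -> node 0
    [0, 1],  # node 2: label 0 -> node 0, label 1 -> node 1
]

def step(delta, node, label):
    """Cross the outgoing edge with the given label."""
    return delta[node][label]

def reachable(adj, start):
    """Set of nodes reachable from start (breadth-first search)."""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def is_strongly_connected(delta):
    """True iff every node can reach every other node."""
    n = len(delta)
    reverse = [[] for _ in range(n)]
    for u, outs in enumerate(delta):
        for v in outs:
            reverse[v].append(u)
    return len(reachable(delta, 0)) == n and len(reachable(reverse, 0)) == n
```

Because edges cannot be crossed backwards, `step` has no inverse; this is the sense in which a robot cannot retrace its steps.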

For this model, one might imagine a straightforward learning algorithm with a running time 
polynomial in a specific property of the graph's structure such as cover time or mixing time. Any 
such algorithm could require an exponential number of steps, however, since the cover time and 
mixing time of directed graphs can be exponential in the number of nodes. In this paper, we present 
a probabilistic algorithm for two robots to learn any strongly-connected directed graph in $O(d^2 n^5)$
steps with high probability. 

The two robots in our model can recognize when they are at the same node and can communicate 
freely by radio. Radio communication is used only to synchronize actions. In fact, if we assume 
that the two robots move synchronously and share a polynomial-length random string, then no 
communication is necessary. Thus with only minor modifications, our algorithms may be used in a 
distributed setting. 

Our main algorithm runs without prior knowledge of the number of nodes in the graph, n, in 
time polynomial in n. We show that no probabilistic polynomial-time algorithm for a single robot 
with a constant number of pebbles can learn all unlabeled directed graphs when n is unknown. 
Thus, our algorithms demonstrate that two robots are strictly more powerful than one. 

1.1 Related Work 

Previous results showing the power of team learning are plentiful, particularly in the field of inductive
inference (see Smith [Smi94] for an excellent survey). Several team learning papers explore the
problems of combining the abilities of a number of different learners. Cesa-Bianchi et al. [CBF+93] 
consider the task of learning a probabilistic binary sequence given the predictions of a set of experts 
on the same sequence. They show how to combine the prediction strategies of several experts to 
predict nearly as well as the best of the experts. In a related paper, Kearns and Seung [KS93] 
explore the statistical problems of combining several independent hypotheses to learn a target concept
from a known, restricted concept class. In their model, each hypothesis is learned from a
different, independently-drawn set of random examples, so the learner can combine the results to 
perform significantly better than any of the hypotheses alone. 

There are also many results on learning unknown graphs, but most previous work has concentrated
on learning undirected graphs or graphs with distinguishable nodes. For example, Deng and
Papadimitriou consider the problem of learning strongly-connected, directed graphs with labeled
nodes, so that the learner can recognize previously-seen nodes. They provide a learning algorithm 
whose competitive ratio (versus the optimal time to traverse all edges in the graph) is exponential
in the deficiency of the graph [DP90, Bet92]. Betke, Rivest, and Singh introduce the notion of
piecemeal learning of undirected graphs with labeled nodes. In piecemeal learning, the learner must 
return to a fixed starting point from time to time during the learning process. Betke, Rivest, and 
Singh provide linear algorithms for learning grid graphs with rectangular obstacles [BRS93], and 
with Awerbuch [ABRS95] extend this work to show nearly-linear algorithms for general graphs. 

Rivest and Schapire [RS87, RS93] explore the problem of learning deterministic finite automata 
whose nodes are not distinguishable except by the observed output. We rely heavily on their results 
in this paper. Their work has been extended by Freund et al. [FK+93], and by Dean et al. [DA+92].
Freund et al. analyze the problem of learning finite automata with average-case labelings by the 
observed output on a random string, while Dean et al. explore the problem of learning DFAs with 
a robot whose observations of the environment are not always reliable. Ron and Rubinfeld [RR95] 
present algorithms for learning "fallible" DFAs, in which the data is subject to persistent random 
errors. Recently, Ron and Rubinfeld [RR95b] have shown that a teacher is unnecessary for learning 
finite automata with small cover time. 

In our model a single robot is powerless because it is completely unable to distinguish one node 
from any other. However, when equipped with a number of pebbles that can be used to mark 
nodes, the single robot's plight improves. Rabin first proposed the idea of dropping pebbles to 
mark nodes [Rab67]. This suggestion led to a body of work exploring the searching capabilities of a 
finite automaton supplied with pebbles. Blum and Sakoda [BS77] consider the question of whether 
a finite set of finite automata can search a 2 or 3-dimensional obstructed grid. They prove that a 
single automaton with just four pebbles can completely search any 2-dimensional finite maze, and 
that a single automaton with seven pebbles can completely search any 2-dimensional infinite maze. 
They also prove, however, that no collection of finite automata can search every 3-dimensional 
maze. Blum and Kozen [BK78] improve this result to show that a single automaton with 2 pebbles 
can search a finite, 2-dimensional maze. Their results imply that mazes are strictly easier to search 
than planar graphs, since they also show that no single automaton with pebbles can search all 
planar graphs. Savitch [Sav73] introduces the notion of a maze-recognizing automaton (MRA), 
which is a DFA with a finite number of distinguishable pebbles. The mazes in Savitch's paper are 
n-node 2-regular graphs, and the MRAs have the added ability to jump to the node with the next
higher or lower number in some ordering. Savitch shows that maze-recognizing automata and log n 
space-bounded Turing machines are equivalent for the problem of recognizing threadable mazes 
(i.e., mazes in which there is a path between a given pair of nodes). 

Most of these papers use pebbles to model memory constraints. For example, suppose that 
the nodes in a graph are labeled with $\log n$-bit names and that a finite automaton with $k \log n$
bits of memory is used to search the graph. This situation is modeled by a single robot with k 
distinguishable pebbles. A robot dropping a pebble at a node corresponds to a finite automaton 
storing the name of that node. In our paper, by contrast, we investigate time rather than space 
constraints. Since memory is now relatively cheap but time is often critical, it makes sense to ask
whether a robot with any reasonable amount of memory can use a constant number of pebbles to
learn graphs in polynomial time. 

Cook and Rackoff generalized the idea of pebbles to jumping automata for graphs (JAGs) 
[CR80]. A jumping automaton is equipped with pebbles that can be dropped to mark nodes and 
that can "jump" to the locations of other pebbles. Thus, this model is similar to our two-robot 
model in that the second robot may wait at a node for a while (to mark it) and then catch up to 
the other robot later. However, the JAG model is somewhat broader than the two-robot model. 
Cook and Rackoff show upper and lower bounds of $\log^2 n$ and $\log^2 n / \log\log n$ on the amount of space
required to determine whether there is a directed path between two designated nodes in any n-node
graph. JAGs have been used primarily to prove space efficiency for st-connectivity algorithms, and
they have recently resurfaced as a tool for analyzing time and space tradeoffs for graph traversal
and connectivity problems (e.g., [BB+90, Poo93, Edm93]).

Universal traversal sequences have been used to provide upper and lower bounds for the exploration
of undirected graphs. Certainly, a universal traversal sequence for the class of directed
graphs could be used to learn individual graphs. However, for arbitrary directed graphs with n 
nodes, a universal traversal sequence must have size exponential in n. Thus, such sequences will 
not provide efficient solutions to our problem. 

1.2 Strategy of the Learning Algorithm 

The power behind the two-robot model lies in the robots' abilities to recognize each other and to 
move independently. Nonetheless, it is not obvious how to harness this power. If the robots separate 
in unknown territory, they could search for each other for an amount of time exponential in the size 
of the graph. Therefore, in any successful strategy for our model the two robots must always know 
how to find each other. One strategy that satisfies this requirement has both robots following the 
same path whenever they are in unmapped territory. They may travel at different speeds, however, 
with one robot scouting ahead and the other lagging behind. We call this a lead-lag strategy. In a 
lead-lag strategy the lagging robot must repeatedly make a difficult choice. The robot can wait at 
a particular node, thus marking it, but the leading robot may not find this marked node again in 
polynomial time. Alternatively, the lagging robot can abandon its current node to catch up with 
the leader, but then it may not know how to return to that node. In spite of these difficulties, our 
algorithms successfully employ a lead-lag strategy. 

Our work also builds on techniques of Rivest and Schapire [RS93]. They present an algorithm 
for a single robot to learn minimal deterministic finite automata. With the help of an equivalence 
oracle, their algorithm learns a homing sequence, which it uses in place of a reset function. It then 
runs several copies of Angluin's algorithm [Ang87] for learning DFAs given a reset. Angluin has 
shown that any algorithm for actively learning DFAs requires an equivalence oracle [Ang81]. 

In this paper, we introduce a new type of homing sequence for two robots. Because of the 
strength of the homing sequence, our algorithm does not require an equivalence oracle. For any 
graph, the expected running time of our algorithm is $O(d^2 n^5)$. In practice, our algorithm can
use additional information such as indegree, outdegree, or color of nodes to find better homing 
sequences and to run faster. 



Note that throughout the paper, the analyses of the algorithms account for only the number of 
steps that the robots take across edges in the graph. Additional calculations performed between 
moves are not considered, so long as they are known to take time polynomial in n. In practice, 
such calculations would not be a noticeable factor in the running time of our algorithms. 

Two robots can learn specific classes of directed graphs more quickly, such as the class of 
graphs with high conductance. Conductance, a measure of the expansion properties of a graph, 
was introduced by Sinclair and Jerrum [SJ89]. The class of directed graphs with high conductance 
includes graphs with exponentially-large cover time. We present a randomized algorithm that learns 
graphs with conductance greater than n~~ in $O(dn^4 \log n)$ steps with high probability.

2 Preliminaries 

Let G = (V, E) represent the unknown graph, where G has n nodes, each with outdegree d. An edge 
from node u to node v with label i is denoted (u,i,v). We say that an algorithm learns graph G 
if it outputs a graph isomorphic to G. Our algorithms maintain a graph map which represents the 
subgraph of G learned so far. Included in map is an implicit start node $u_0$. It is worth emphasizing
the difference between the target graph G and the graph map that the learner constructs. The 
graph map is meant to be a map of the underlying environment, G. However, since the robots do 
not always know their exact location in G, in some cases map may contain errors and therefore may 
not be isomorphic to any subgraph of G. Much of the notation in this section is needed to specify 
clearly whether we are referring to a robot's location in the graph G or to its putative location in 
map. 

A node u in map is called unfinished if it has any unexplored outgoing edges. Node u is
map-reachable from node v if there is a path from v to u containing only edges in map. For robot k, the
node in map corresponding to k's location in G if map is correct is denoted $\mathit{Loc}_M(k)$. Robot k's
location in G is denoted $\mathit{Loc}_G(k)$.

Let f be an automorphism on the nodes of G such that

$$\forall a, b \in G,\quad (a, i, b) \in G \iff (f(a), i, f(b)) \in G.$$

We say nodes c and d are equivalent (written $c \equiv d$) iff there exists such an f where $f(c) = d$.
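
As an illustration of this definition, the following Python sketch decides whether two nodes are equivalent. It relies on an observation that is ours rather than the paper's: in a strongly-connected graph with labeled out-edges, an automorphism is fully determined by the image of a single node, so it suffices to propagate $f(c) = d$ along edges and check for conflicts. The function name `equivalent` and both example graphs are hypothetical.

```python
from collections import deque

def equivalent(delta, c, d):
    """True iff some label-preserving automorphism maps node c to node d.

    delta[u][i] is the node reached from u by edge label i; the graph is
    assumed to be strongly connected, so the search reaches every node.
    """
    f = {c: d}
    queue = deque([c])
    while queue:
        a = queue.popleft()
        for label, b in enumerate(delta[a]):
            image = delta[f[a]][label]   # forced value of f(b)
            if b not in f:
                f[b] = image
                queue.append(b)
            elif f[b] != image:
                return False             # two paths force different images
    return len(set(f.values())) == len(delta)   # f must be a bijection

cyclic = [[1, 2], [2, 0], [0, 1]]        # every node looks the same
asymmetric = [[1, 0], [2, 0], [0, 2]]    # no two distinct nodes are equivalent
```

In the `cyclic` example all nodes are equivalent, so no learner could tell them apart; in the `asymmetric` example each node is equivalent only to itself.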

We now present notation to describe the movements of k robots in a graph. An action $A_i$ of
the ith robot is either a label of an outgoing edge to explore, or the symbol r for "rest." A k-robot
sequence of actions is a sequence of steps denoting the actions of the k robots; each step is a k-tuple
$(A_0, \ldots, A_{k-1})$. For sequences s and t of actions, $s \circ t$ denotes the sequence of actions obtained by
concatenating s and t.

A path is a sequence of edge labels. Let $|\mathit{path}|$ represent the length of path. A robot follows
a path by traversing the edges in the path in order beginning at a particular start node in map.
The node in map reached by starting at $u_0$ and following path is denoted $\mathit{final}_M(\mathit{path}, u_0)$. Let s be
a two-robot sequence of actions such that if both robots start together at any node in any graph
and execute s, they follow exactly the same path, although perhaps at different speeds. We call
such a sequence a lead-lag sequence. Note that if two robots start together and execute a lead-lag
sequence, they end together. The node in G reached if both robots start at node a in G and follow
lead-lag sequence s is denoted $\mathit{final}_G(s, a)$.

For convenience, we name our robots Lewis and Clark. Whenever Lewis and Clark execute a 
lead-lag sequence of actions, Lewis leads while Clark lags behind. 
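
The notation above can be made concrete with a short Python sketch. We assume, for illustration only, that a two-robot sequence is a list of pairs (Lewis's action, Clark's action) and that the rest symbol is the string "r"; the helper names are hypothetical. The check uses the fact that two robots starting together follow the same path in every graph exactly when they cross the same labels in the same order.

```python
REST = "r"  # the rest symbol; every other action is an edge label

def labels_of(seq, robot):
    """Edge labels crossed by one robot (0 = Lewis, 1 = Clark), rests dropped."""
    return [actions[robot] for actions in seq if actions[robot] != REST]

def is_lead_lag(seq):
    """Both robots cross the same labels in the same order."""
    return labels_of(seq, 0) == labels_of(seq, 1)

def concat(s, t):
    """The concatenated sequence of s followed by t."""
    return s + t

# Lewis crosses edges 1 and 0 while Clark rests; then Clark catches up.
s = [(1, REST), (0, REST), (REST, 1), (REST, 0)]
# Both robots cross edge 0 in the same step.
t = [(0, 0)]
```

Concatenating two lead-lag sequences gives another lead-lag sequence, which is why the homing sequences of Section 4 can be extended piece by piece.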

3 Using a Reset to Learn 

Learning a directed graph with indistinguishable nodes is difficult because once both robots have 
left a known portion of the graph, they do not know how to return. This problem would vanish 
if there were a reset function that could transport both robots to a particular start node $u_0$. We
describe an algorithm for two robots to learn directed graphs given a reset. Having a reset is not a 
realistic model, but this algorithm forms the core of later algorithms, which learn without a reset. 

Algorithm Learn-with-Reset maintains the invariant that if a robot starts at $u_0$, there is a
directed path it can follow that visits every node in map at least once. To learn a new edge (one 
not yet in map) using algorithm Learn-with-Reset, Lewis crosses the edge and then Clark tours 
the entire known portion of the map. If they encounter each other, Lewis's position is identified; 
otherwise Lewis is at a new node. The depth-first strategy employed by Learn-Edge is essential 
in later algorithms. In Learn-with-Reset, as in all the procedures in this paper, variables are 
passed by reference and are modified destructively. 

Lemma 1 The variable path in Learn-with-Reset denotes a tour of length $\le dn^2$ that starts at
$u_0$ and traverses all edges in map.



Learn-with-Reset():
1  map := ({u_0}, ∅)          { map is the graph consisting of node u_0 and no edges }
2  path := empty path         { path is the null sequence of edge labels }
3  k := 1                     { k counts the number of nodes in map }
4  while there are unfinished nodes in map
5     do Learn-Edge(map, path, k)
6        Reset
7  return map

Learn-Edge(map, path, k):     { path = tour through all edges in map }
1  Lewis follows path to final_M(path, u_0)
2  u_i := some unfinished node in map map-reachable from Loc_M(Lewis)
3  Lewis moves to node u_i; append the path taken to path
4  pick an unexplored edge ℓ out of node u_i
5  Lewis moves along edge ℓ; append edge ℓ to path     { Lewis crosses a new edge }
6  Clark follows path to final_M(path, u_0)             { Clark looks for Lewis }
7  if ∃ j < k such that Clark first encountered Lewis at node u_j
8     then add edge (u_i, ℓ, u_j) to map
9     else add new node u_k and edge (u_i, ℓ, u_k) to map
10         k := k + 1


Proof: Every time Lewis crosses an edge, that edge is appended to path. Since no edge is added to
map until Lewis has crossed it, path must traverse all edges in map. In each call to Learn-Edge, at
most n edges are added to path. The body of the while loop is executed dn times, so $|\mathit{path}| \le dn^2$. □

Lemma 2 Map is always a subgraph of G.

Proof: Initially map contains a single node $u_0$ and no edges. Assume inductively that map is a
subgraph of G after the cth call to Learn-Edge (when map has c edges). To learn the next edge,
the algorithm chooses a node $u_i$ in map and explores a new edge $e = (u_i, \ell, v)$. By Lemma 1 and
the inductive hypothesis, if Clark encounters Lewis at $u_j$ then v is identified as $u_j$. Otherwise v is
recognized to be a new node and named $u_k$. Therefore the updated map is a subgraph of G. □

Lemma 3 If map contains any unfinished nodes, then there is always some unfinished node in
map map-reachable from $\mathit{final}_M(\mathit{path}, u_0)$.

Proof: Suppose the claim were false. Then there is some unfinished node in map, but
all nodes of map in the strongly-connected component of $\mathit{final}_M(\mathit{path}, u_0)$ are finished. Thus by
Lemma 2, there are no additional edges of G leaving that component, so graph G is not strongly
connected, a contradiction. □

Theorem 4 After $O(d^2 n^3)$ moves and dn calls to Reset, Learn-with-Reset halts and outputs a
graph isomorphic to G.

Proof: The correctness of the output follows from Lemmas 1-3. For each call to Learn-Edge,
each robot takes $|\mathit{path}| \le dn^2$ steps. The algorithm Learn-Edge is executed at most dn
times, so the algorithm halts within $O(d^2 n^3)$ steps. □
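
To make the procedure concrete, here is a small Python simulation of Learn-with-Reset. It is a sketch under our own assumptions, not the authors' implementation: the hidden graph is a table `delta`, a reset is simulated by restarting both robots at the start node, the unfinished node is chosen by breadth-first search in map, and Clark stops touring as soon as he meets Lewis. The helper `matches` and the four-node graph `hidden` are hypothetical.

```python
from collections import deque

def learn_with_reset(delta, start=0):
    """Simulate Learn-with-Reset on a hidden graph; return (map, steps).

    delta[u][i] is the hidden node reached from u by edge label i. The
    learner never reads node names; it only learns whether Clark meets Lewis.
    """
    d = len(delta[0])
    graph_map = {0: {}}      # map node -> {label: map node}; node 0 is u_0
    path, steps = [], 0      # path tours every edge learned so far

    def unfinished(u):
        return len(graph_map[u]) < d

    while any(unfinished(u) for u in graph_map):
        # After a reset both robots are at u_0; Lewis follows path.
        lewis_g, lewis_m = start, 0
        for label in path:
            lewis_g = delta[lewis_g][label]
            lewis_m = graph_map[lewis_m][label]
        # Breadth-first search in map for the nearest unfinished node u_i
        # (Lemma 3 guarantees that one is map-reachable).
        route, queue = {lewis_m: []}, deque([lewis_m])
        while not unfinished(queue[0]):
            u = queue.popleft()
            for label, v in graph_map[u].items():
                if v not in route:
                    route[v] = route[u] + [label]
                    queue.append(v)
        u_i = queue[0]
        new_label = min(set(range(d)) - set(graph_map[u_i]))
        for label in route[u_i] + [new_label]:   # the new edge is crossed last
            lewis_g = delta[lewis_g][label]
        path += route[u_i] + [new_label]
        steps += len(path)
        # Clark tours the known map and checks for Lewis before every move.
        clark_g, clark_m = start, 0
        for label in path[:-1]:
            if clark_g == lewis_g:
                break
            clark_g = delta[clark_g][label]
            clark_m = graph_map[clark_m][label]
            steps += 1
        if clark_g == lewis_g:                   # Lewis is at known node u_j
            graph_map[u_i][new_label] = clark_m
        else:                                    # Lewis is at a new node u_k
            graph_map[len(graph_map)] = {}
            graph_map[u_i][new_label] = len(graph_map) - 1
    return graph_map, steps

def matches(graph_map, delta, start=0):
    """Check that graph_map is isomorphic to delta with u_0 mapped to start."""
    f, queue = {0: start}, deque([0])
    while queue:
        u = queue.popleft()
        for label in range(len(delta[0])):
            v, g = graph_map[u].get(label), delta[f[u]][label]
            if v is None:
                return False
            if v not in f:
                f[v] = g
                queue.append(v)
            elif f[v] != g:
                return False
    return len(f) == len(delta) == len(set(f.values()))

hidden = [[1, 0], [2, 0], [3, 1], [0, 2]]   # hypothetical graph: n = 4, d = 2
learned, steps = learn_with_reset(hidden)
```

In this simulation each round adds at most n labels to `path`, so the step count stays within the $2 d^2 n^3$ moves implied by the proof of Theorem 4 (each of the two robots takes at most $dn^2$ steps in each of the dn rounds).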

4 Homing Sequences 

In practice, robots learning a graph do not have access to a reset function. In this section we 
suggest an alternative technique: we introduce a new type of homing sequence for two robots. 

Intuitively, a homing sequence is a sequence of actions whose observed output uniquely determines
the final node reached in G. Rivest and Schapire [RS93] show how a single robot with a
teacher can use homing sequences to learn strongly-connected minimal DFAs. The output at each 
node indicates whether that node is an accepting or rejecting state of the automaton. If the target 
DFA is not minimal, their algorithm learns the minimal encoding of the DFA. In other words, their 
algorithm learns the function that the graph computes rather than the structure of the graph. 

In unlabeled graphs the nodes do not produce output. However, two robots can generate output 
indicating when they meet. 

Definitions: Each step of a two-robot sequence of actions produces an output symbol T if the
robots are together and S if they are separate. An output sequence is a string in $\{T, S\}^*$ denoting
the observed output of a sequence of actions. Let s be a lead-lag sequence of actions and let a be
a node in G. Then $\mathit{output}(s, a)$ denotes the output produced by executing the sequence s, given
that both robots start at a. A lead-lag sequence s of actions is a two-robot homing sequence iff
for all nodes $u, v \in G$,

$$\mathit{output}(s, u) = \mathit{output}(s, v) \implies \mathit{final}_G(s, u) = \mathit{final}_G(s, v).$$

Because the output of a sequence depends on the positions of both robots, it provides information 
about the underlying structure of the graph. Figure 1 illustrates the definition of a two-robot 
homing sequence. This new type of homing sequence is powerful. Unlike most previous learning 
results using homing sequences, our algorithms do not require a teacher to provide counterexamples. 
In fact, two robots on a graph define a DFA whose states are pairs of nodes in G and whose 
edges correspond to pairs of actions. Since the automata defined in this way form a restricted class 
of DFAs, our results are not inconsistent with Angluin's work [Ang81] showing that a teacher is 
necessary for learning general DFAs. 
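
The definition can be checked by brute force on a small example. The Python sketch below is illustrative only: the three-node graph `delta` is a hypothetical example (it is not the graph of Figure 1), sequences are lists of pairs (Lewis's action, Clark's action) with "r" for rest, and the names `run` and `is_homing` are our own.

```python
REST = "r"

def run(delta, seq, start):
    """Execute a two-robot sequence from a common start node.

    Returns the output string in {T, S}* and the final (Lewis, Clark) nodes.
    """
    lewis = clark = start
    out = []
    for a_lewis, a_clark in seq:
        if a_lewis != REST:
            lewis = delta[lewis][a_lewis]
        if a_clark != REST:
            clark = delta[clark][a_clark]
        out.append("T" if lewis == clark else "S")
    return "".join(out), (lewis, clark)

def is_homing(delta, seq):
    """Check the homing-sequence definition over all possible start nodes."""
    final_for_output = {}
    for a in range(len(delta)):
        out, (lewis, clark) = run(delta, seq, a)
        assert lewis == clark, "seq is not a lead-lag sequence"
        if final_for_output.setdefault(out, lewis) != lewis:
            return False      # same output, different final nodes
    return True

# Hypothetical graph with n = 3, d = 2; delta[u][i] is the target of edge i.
delta = [[1, 0], [2, 0], [0, 2]]

# Lewis crosses edge 0, then Clark follows: every start node outputs "ST".
s = [(0, REST), (REST, 0)]

# Probe edge 1, then probe the path 0,1: each start node gives its own output.
h = [(1, REST), (REST, 1), (0, REST), (1, REST), (REST, 0), (REST, 1)]
```

Here `s` is a lead-lag sequence but not a homing sequence, since all three start nodes produce the output ST yet end at three different nodes. The longer sequence `h` produces a different output from each start node, so it satisfies the definition.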

Theorem 5 Every strongly-connected directed graph has a two-robot homing sequence. 

Proof: The following algorithm (based on that of Kohavi [Koh78, RS93]) constructs a homing
sequence: Initially, let h be empty. As long as there are two nodes u and v in G such that
$\mathit{output}(h, u) = \mathit{output}(h, v)$ but $\mathit{final}_G(h, u) \ne \mathit{final}_G(h, v)$, let x be a lead-lag sequence whose
output distinguishes $\mathit{final}_G(h, u)$ from $\mathit{final}_G(h, v)$. Since $\mathit{final}_G(h, u) \ne \mathit{final}_G(h, v)$ and G is
strongly connected, such an x always exists. Append x to h.

Each time a sequence is appended to h, the number of different outputs of h increases by at
least 1. Since G has n nodes, there are at most n possible output sequences. Therefore, after at most $n - 1$
iterations, h is a homing sequence. □

In Section 5 we show that it is possible to find a counterexample x efficiently. Given a
strongly-connected graph G and a node a in G, a pair of robots can verify whether they are together at a node
equivalent to a on some graph isomorphic to G. We describe a verification algorithm Verify(a,
G) in Section 5. The sequence of actions returned by a call of Verify(u, G) is always a suitable
counterexample x. Using the bound from Corollary 8, we claim that this algorithm produces a
homing sequence of length $O(n^4)$ for all graphs. Note that shorter homing sequences exist; the
homing sequence produced by algorithm Learn-Graph in Section 5 has length $O(dn^3)$.

4.1 Using a Homing Sequence to Learn 

Given a homing sequence h, an algorithm can learn G by maintaining several running copies of 
Learn-with-Reset. Instead of a single start node, there are as many as n possible start nodes, 
each corresponding to a different output sequence of h. Note that many distinct output sequences 
may be associated with the same final node in G. 

The new algorithm Learn-with-HS maintains several copies of map and path, one for each 
output sequence of h. Thus, graph $\mathit{map}_c$ denotes the copy of the map associated with output
sequence c. Initially, Lewis and Clark are at the same node. Whenever algorithm Learn-with-Reset
would use a reset, Learn-with-HS executes the homing sequence h. If the output of h is
c, the algorithm learns a new edge in $\mathit{map}_c$ as if it had been reset to $u_0$ in $\mathit{map}_c$ (see Figure 2).
After each execution of h, the algorithm learns a new edge in some $\mathit{map}_c$. Since there are at most
n copies, each with dn edges to learn, eventually one map will be completed. Recall that a homing
sequence is a lead-lag sequence. Therefore, at the beginning and end of every homing sequence the
two robots are together.

Figure 1: Illustration of a two-robot homing sequence and a lead-lag sequence. Note that both h
and s are lead-lag sequences. However, sequence h is a two-robot homing sequence, because for
each output sequence there is a unique end node. (Note that the converse is not true.) Sequence
s is not a two-robot homing sequence, because the robots may end at nodes a or c and yet see the
same output sequence ST.

Learn-with-HS(h):
1  done := FALSE
2  while not done
3     do execute h; c := the output sequence produced     { instead of a reset }
4        if map_c is undefined
5           then map_c := ({u_0}, ∅)     { map_c = graph consisting of node u_0 and no edges }
6                path_c := empty path    { path_c is the null sequence of edge labels }
7                k_c := 1                { k_c counts the number of nodes in map_c }
8        Learn-Edge(map_c, path_c, k_c)
9        if map_c has no unfinished nodes
10          then done := TRUE
11 return map_c

Theorem 6 If Learn-with-HS is called with a homing sequence h as input, it halts within
$O(d^2 n^4 + dn^2 |h|)$ steps and outputs a graph isomorphic to G.

Proof: The algorithm Learn-with-HS maintains at most n running versions of Learn-with-Reset,
one for each output of the homing sequence. In particular, whenever the two robots execute
a homing sequence and obtain an output c, they have identified their position as the start node $u_0$
in $\mathit{map}_c$, and can learn one new edge in $\mathit{map}_c$ before executing another homing sequence.

Eventually, one of the versions halts and outputs a complete $\mathit{map}_c$. Therefore, the correctness
of Learn-with-HS follows directly from Theorem 4 and the definition of a two-robot homing
sequence.

Let $r = O(d^2 n^3)$ be the number of steps taken by Learn-with-Reset. Since there are at most
n start nodes, Learn-with-HS takes at most $nr + dn^2 |h|$ steps. □
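
The accounting behind this bound can be written out term by term. The display below only restates the argument above, using the facts that there are at most n maps, that each map has at most dn edges to learn, and that each learned edge is preceded by one execution of h:

```latex
\begin{align*}
\text{steps spent learning edges} &\le n \cdot r = n \cdot O(d^2 n^3) = O(d^2 n^4), \\
\text{executions of } h &\le n \cdot dn = dn^2, \\
\text{steps spent executing } h &\le dn^2 \, |h|, \\
\text{total steps} &\le nr + dn^2 |h| = O(d^2 n^4 + dn^2 |h|).
\end{align*}
```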

5 Learning a Homing Sequence 

Unlike a reset function, a two-robot homing sequence can be learned. The algorithm Learn-Graph 
maintains a candidate homing sequence h and improves h as it learns G. 

Definition: Candidate homing sequence h is called a bad homing sequence if there exist nodes
$u, v$, $u \ne v$, such that $\mathit{output}(h, u) = \mathit{output}(h, v)$, but $\mathit{final}_G(h, u) \ne \mathit{final}_G(h, v)$.

Definition: Let a be a node in G. We say that $\mathit{map}_c$ with start node $u_0$ is a good representation
of $(a, G)$ iff there exists an isomorphism f from the nodes in $\mathit{map}_c = (V^c, E^c)$ to the nodes in a
subgraph $G' = (V', E')$ of G, such that $f(u_0) = a$, and

$$\forall u_i, u_j \in V^c,\quad (u_i, \ell, u_j) \in E^c \iff (f(u_i), \ell, f(u_j)) \in E'.$$

In algorithms Learn-with-Reset and Learn-with-HS, the graphs map and $\mathit{map}_c$ are always
good representations of G. In Learn-Graph if the candidate homing sequence h is bad, a particular
$\mathit{map}_c$ may not be a good representation of G. However, the algorithm can test for such maps.
Whenever a $\mathit{map}_c$ is shown to be in error, h is improved and all maps are discarded. By the proof
of Theorem 5, we know that a candidate homing sequence must be improved at most $n - 1$ times.
In Section 5.1 we explain how to use adaptive homing sequences to discard only one map per
improvement.

Figure 2: A possible "snapshot" of the learners' knowledge during an execution of Learn-with-HS.
The robots are learning the graph G from Figure 1 using the two-robot homing sequence h
from Figure 1. (Node names in maps are not known to the learner, but are added for clarity.) The
following example demonstrates how the robots learn a new edge using Learn-with-HS. Suppose
that the robots execute h and see output TTST. Then the robots are together at node b. Lewis
follows path 1, 0 to unfinished node c and then crosses the edge labeled 1. Now Clark follows path
1, 0, 1. Since Clark sees Lewis after 2 steps, the dotted edge is added to $\mathit{map}_{TTST}$. Next, the robots
execute h again and see output STST. Thus, they go on to learn some edge in $\mathit{map}_{STST}$.

We now define a test that with probability at least $1/n$ detects an error in $\mathit{map}_c$ if one exists.

Definition: Let $\mathit{path}_c$ be a path such that a robot starting at $u_0$ and following $\mathit{path}_c$ traverses
every edge in $\mathit{map}_c = (V^c, E^c)$. Let $u_0, \ldots, u_m$ be the nodes in $V^c$ numbered in order of their first
appearance in $\mathit{path}_c$. If both robots are at $u_0$ then $\mathit{test}_c(u_i)$ denotes the following lead-lag sequence
of actions: (1) Lewis follows $\mathit{path}_c$ to the first occurrence of $u_i$; (2) Clark follows $\mathit{path}_c$ to the first
occurrence of $u_i$; (3) Lewis follows $\mathit{path}_c$ to the end; (4) Clark follows $\mathit{path}_c$ to the end.

Definition: Given $\mathit{map}_c$ and any lead-lag sequence t of actions, define $\mathit{expected}(t, \mathit{map}_c)$ to be
the expected output if $\mathit{map}_c$ is correct and if both robots start at node $u_0$ and execute sequence t.
We abbreviate $\mathit{expected}(\mathit{test}_c(u_i), \mathit{map}_c)$ by $\mathit{expected}(\mathit{test}_c(u_i))$.

Lemma 7 Suppose Lewis and Clark are both at some node a in G. Let path c be a path such that 
a robot starting at u and following path c traverses every edge in map c . Then map c is a good 
representation of (a,G) iffVui G V c , output (test c (ui)) = expected( / test c (M J ) y ). 

Proof: 

(=^): By definition of good representation and expected (test c (ui)). 

(^=): Suppose that all tests produce the expected output. We define a function / as follows: Let 

f(u ) = a. Let p{ui) be the prefix of path c up to the first occurrence of m,. Define /(«;) to be the 



10 



{ instead of a reset } 



Learn- Graph(): 

1 done := FALSE 

2 h := A (empty sequence) 

3 while not done 

4 do execute h; c := the output sequence produced 

5 if map c is undefined 

6 then map c := ({«o}, 0) { map c is the graph consisting of node «o and no edges } 

7 path c := empty path { path c is the null sequence of edge labels } 

8 k c := 1 { k c counts the number of nodes in map } 

9 if map c has no unfinished node map-reachable from finalM (path c ) 

10 then Lewis and Clark move to finalM(path c ) 

11 comp := maximal strongly-connected component in map c containing finalM(path c ) 

12 h-improve := Verify (finalM(path c ), comp) 

13 if h-improve = A 

14 then done := TRUE 

15 else 

16 append h-improve to end of h 

17 discard all maps and paths { 

18 else v := value of a fair 0/1 coin flip 

19 ifv = 

20 then U{ := a random node in map c 

21 h-improve := Test(map c , path c , i) 

22 if h-improve ^ A 

23 then append h-improve to end of h 

24 discard all maps and paths { . . 

25 else Le&ri).-~Edg ) e(map c ,path c ,k c ) 

26 return map c 



{ map c is complete } 

{ h-improve ^ A. error detected } 

{ improve homing sequence . . . } 

. and start learning maps from scratch } 

{ learn edges or test for errors? } 

{ test for errors } 

{ randomly pick node to test } 



{ error detected } 

{ improve homing sequence . . . } 

. start learning maps from scratch } 



Test(map c , path c , i): {«o,wi, 



.u k 



the nodes in map c indexed by first appearance in path c } 



1 h-improve := the following sequence of actions: 

2 Lewis follows path c to the first occurrence of U{ in path c 

3 Clark follows path c to the first occurrence of U{ in path c 

4 Lewis follows path c to the end 

5 Clark follows path c to the end 

6 if output (h-improve) ^ expected- output (h-improve) 

7 then return h-improve 

8 else return A 



{ if error detected } 

{ return test c (ui) } 

{ return empty sequence } 



Verify(v_0, map):     { v_0, v_1, ..., v_k are the nodes in map ordered by first appearance in path }

1  path := path such that a robot starting at v_0 in map and following path visits all nodes
           in map and returns to v_0
2  for each i, 0 ≤ i ≤ k
3     do h-improve := Test(map, path, i)
4        if h-improve ≠ λ
5           then return h-improve
6  return λ
node in $G$ that a robot reaches if it starts at $a$ and follows $p(u_i)$. Let $G' = (V', E')$ be the image
of $f(map_c)$ on $(V, E)$.

We first show that $f$ is an isomorphism from $V_c$ to $V'$. By definition of $G'$, $f$ must be surjective.
To see that $f$ is injective, assume the contrary. Then there exist two nodes $u_i, u_j \in V_c$ such that
$i \neq j$ but $f(u_i) = f(u_j)$. But then $output(test_c(u_i)) \neq expected(test_c(u_i))$, which contradicts our
assumption that all tests succeed. Next, we show that $(u_i, \ell, u_j) \in E_c \iff (f(u_i), \ell, f(u_j)) \in E'$,
proving that $map_c$ is a good representation of $(a, G)$.

($\Leftarrow$): By definition of $G'$, the image of $map_c$.

($\Rightarrow$): Inductively assume that $(u_i, \ell, u_j) \in E_c \iff (f(u_i), \ell, f(u_j)) \in E'$ for the first $m$ edges
in $path_c$, and suppose that this prefix of the path visits only nodes $u_0, \ldots, u_i$. Now consider the
$(m+1)$st edge $e = (a, \ell, b)$. There are two possibilities. In one case, edge $e$ leads to some new node
$u_{i+1}$. Then by definition $f(u_{i+1})$ is $b$'s image in $G$, so $(f(a), \ell, f(b)) \in G'$. Otherwise $e$ leads to
some previously-seen node $u_{i-k}$. Suppose that $f(u_{i-k})$ is not the node reached in $G$ by starting at
$u_0$ and following the first $m+1$ edges in $path_c$. Then $output(test_c(u_{i-k})) \neq expected(test_c(u_{i-k}))$,
so $test_c(u_{i-k})$ fails, and we arrive at a contradiction. Therefore $f(b) = f(u_{i-k})$ and $(f(a), \ell, f(b)) \in G'$. □

Corollary 8 Suppose Lewis and Clark are together at $u_0$ in $map_c$. Let $map_c$ be strongly connected
and have $n$ nodes, $u_0, \ldots, u_{n-1}$. Then the two robots can verify whether $map_c$ is a good
representation of $(Loc_G(Lewis), G)$ in $O(n^3)$ steps.

Proof: Since $map_c$ is strongly connected, there exists a path $path_c$ with the following property:
a robot starting at $u_0$ and following $path_c$ visits all nodes in $map_c$ and returns to $u_0$. Index the
remaining nodes in order of their first appearance in $path_c$. The two robots verify whether, for all
$u_i$ in order, $output(test_c(u_i)) = expected(test_c(u_i))$. Note that Lewis and Clark are together at $u_0$
after each test. By Lemma 7, this procedure verifies $map_c$. Since $path_c$ has length $O(n^2)$, each test
has length $O(n^2)$, so verification requires $O(n^3)$ steps. □

In Learn-Graph, after the robots execute a homing sequence, they randomly decide either to
learn a new edge or to test a random node in $map_c$. The following lemma shows that a test that
failed can be used to improve the homing sequence.

Lemma 9 Let $h$ be a candidate homing sequence in Learn-Graph, and let $u_k$ be a node such
that $output(test_c(u_k)) \neq expected(test_c(u_k))$. Then there are two nodes $a, b$ in $G$ that $h$ does not
distinguish but that $h \circ test_c(u_k)$ does.

Proof: Let $a$ be a node in $G$ such that when both robots start at $a$, $output(test_c(u_k)) \neq
expected(test_c(u_k))$. Suppose that at step $i$ in $test_c(u_k)$, the expected output is T (respectively
S), but the actual output is S (resp. T). Each edge in $path_c$ and $map_c$ was learned using Learn-Edge.
If $map_c$ indicates that the $i$th node in $path_c$ is $u_k$, there must be a start node $b$ in $G$ where
$u_k$ really is the $i$th node in $path_c$. Since $output(test_c(u_k)) \neq expected(test_c(u_k))$, the sequence
$h \circ test_c(u_k)$ distinguishes $a$ from $b$. □
The algorithm Learn-Graph runs until there are no more map-reachable unexplored nodes in
some $map_c$. If $map_c$ is not strongly connected, then it is not a good representation of $G$. In this
case, the representation of the last strongly-connected component on path must be incorrect. Thus,
calling Verify on this component from the last node on path returns a sequence that improves $h$.
If $map_c$ is strongly connected, then either Verify returns an improvement to the homing sequence,
or $map_c$ is a good representation of $G$.

Before we can prove the correctness of our algorithm, we need one more set of tools. Consider 
the following statement of Chernoff bounds from Raghavan [Rag89]. 

Lemma 10 Let $X_1, \ldots, X_m$ be independent Bernoulli trials with $E[X_j] = p_j$. Let the random
variable $X = \sum_{j=1}^{m} X_j$, where $\mu = E[X] > 0$. Then for $\delta > 0$,

$$\Pr[X > (1+\delta)\mu] < \left[\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right]^{\mu}$$

and

$$\Pr[X < (1-\delta)\mu] < e^{-\mu\delta^2/2}.$$
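To get a feel for how loose or tight these bounds are, they can be compared numerically with exact binomial tails. The sketch below is illustrative only: it assumes the standard form of the upper-tail bound, $[e^{\delta}/(1+\delta)^{1+\delta}]^{\mu}$, and restricts attention to identically distributed trials so that the exact tail is a binomial sum. All function names are hypothetical.

```python
import math


def chernoff_upper(mu, delta):
    """Bound on Pr[X > (1 + delta) * mu]."""
    return (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu


def chernoff_lower(mu, delta):
    """Bound on Pr[X < (1 - delta) * mu]."""
    return math.exp(-mu * delta ** 2 / 2)


def binomial_pmf(m, p, k):
    """Pr[X = k] for X ~ Binomial(m, p)."""
    return math.comb(m, k) * p ** k * (1 - p) ** (m - k)


def exact_upper_tail(m, p, threshold):
    """Exact Pr[X > threshold] for X ~ Binomial(m, p)."""
    return sum(binomial_pmf(m, p, k) for k in range(m + 1) if k > threshold)


def exact_lower_tail(m, p, threshold):
    """Exact Pr[X < threshold] for X ~ Binomial(m, p)."""
    return sum(binomial_pmf(m, p, k) for k in range(m + 1) if k < threshold)


# Example: m = 100 fair coin flips, so mu = 50, with delta = 0.2.
m, p, delta = 100, 0.5, 0.2
mu = m * p
upper_exact = exact_upper_tail(m, p, (1 + delta) * mu)
lower_exact = exact_lower_tail(m, p, (1 - delta) * mu)
upper_bound = chernoff_upper(mu, delta)
lower_bound = chernoff_lower(mu, delta)
```

For these parameters the exact tails are well below the bounds, which is expected: the bounds trade tightness for a closed form that is easy to manipulate in the running-time analysis.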



In our analysis in this section and in Section 6, the random variables may not be independent. 
However, the following corollary bounds the conditional probabilities. The proof of this corollary 
is exactly analogous to that of a similar corollary by Aumann and Rabin [AR94, Corollary 1]. 

Corollary 11 Let $X_1, \ldots, X_m$ be 0/1 random variables (not necessarily independent), and let $b_j \in
\{0, 1\}$ for $1 \leq j \leq m$. Let the random variable $X = \sum_{j=1}^{m} X_j$. For any $b_1, \ldots, b_{j-1}$ and $\delta > 0$, if
$\Pr[X_j = 1 \mid X_1 = b_1, X_2 = b_2, \ldots, X_{j-1} = b_{j-1}] \leq p_j$ and $\mu = \sum_{j=1}^{m} p_j > 0$, then

$$\Pr[X > (1+\delta)\mu] < \left[\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right]^{\mu}$$

and for any $b_1, \ldots, b_{j-1}$ and $\delta > 0$, if $\Pr[X_j = 1 \mid X_1 = b_1, X_2 = b_2, \ldots, X_{j-1} = b_{j-1}] \geq p_j$ and
$\mu = \sum_{j=1}^{m} p_j > 0$, then

$$\Pr[X < (1-\delta)\mu] < e^{-\mu\delta^2/2}.$$

Theorem 12 The algorithm Learn-Graph always outputs a map isomorphic to $G$ and halts in
$O(d^2n^6)$ steps with overwhelming probability ($1 - e^{-cn}$, where constant $c > 0$ can be chosen as
needed).

Proof: Since Learn-Graph verifies $map_c$ before finishing, if the algorithm terminates then by
Corollary 8 it outputs a map isomorphic to $G$. It is therefore only necessary to show that the
algorithm runs in $O(d^2n^6)$ steps with overwhelming probability.

In each iteration of the while loop in Learn-Graph, if there are no map-reachable unfinished
nodes, then the algorithm attempts to verify the map. Otherwise, the algorithm decides randomly
whether to learn a new edge or to test a random node in the graph. It follows from Lemma 10 that
a constant fraction of the random decisions are for learning and a constant fraction are for testing.
By Theorem 4 the total number of steps spent learning edges in each version of map is $O(d^2n^3)$.
For each candidate homing sequence, there are $n$ versions of map, and the candidate homing
sequence is improved at most $n$ times. Thus, $O(d^2n^5)$ steps are spent learning nodes and edges.

We consider the number of steps taken testing nodes. Each test requires $|path| = O(dn^2)$ steps.
Once a map contains an error, the probability that the robots choose to test a node that is in error
is at least $1/n$. A map with more than $dn$ edges must be faulty. Note that the candidate homing
sequence is improved at most $n - 1$ times. Thus by Corollary 11, with overwhelming probability
after $O(n^2)$ tests of maps with at least $dn$ nodes, the candidate sequence $h$ is a homing sequence.
Overall, the algorithm has to learn $O(dn^3)$ edges, and therefore it executes $O(dn^3)$ tests. Thus the
total number of steps spent testing is $O(d^2n^5)$.

After each test or verification, the algorithm executes a candidate homing sequence. Since
there are $O(dn)$ edges in each map, candidate homing sequences are executed $O(dn^3)$ times. Each
improvement of the candidate homing sequence extends its length by $|path|$, so the time spent
executing homing sequences is $O(d^2n^6)$. Thus, the total running time of the algorithm is $O(d^2n^6)$. □

5.1 Improvements to the Algorithm 

The running time for Learn-Graph can be decreased significantly by using two-robot adaptive
homing sequences. As in Rivest and Schapire [RS93], an adaptive homing sequence is a decision
tree, so the actions in later steps of the sequence depend on the output of earlier steps. With an
adaptive homing sequence, only one $map_c$ needs to be discarded each time the homing sequence is
improved. Thus the running time of Learn-Graph decreases by a factor of $n$ to $O(d^2n^5)$.

Any additional information that distinguishes nodes can be included in the output, so homing
sequences can be shortened even more. For example, a robot learning an unfamiliar city could
easily count the number of roads leading into and out of intersections. It might also recognize
stop signs, traffic lights, railroad tracks, or other common landmarks. Therefore, in any practical
application of this algorithm we expect a significantly lower running time than the $O(d^2n^5)$ bound
suggests.

Graphs with high conductance can be learned even faster using the algorithm presented in 
Section 6. 

5.2 Limitations of a Single Robot with Pebbles 

We now compare the computational power of two robots to that of one robot with a constant 
number of pebbles. Note that although Learn-Graph runs in time polynomial in n, the algorithm 
requires no prior knowledge of n. We argue here that a single robot with a constant number of 
pebbles cannot efficiently learn strongly-connected directed graphs without prior knowledge of n. 
As a tool we introduce a family $\mathcal{C} = \bigcup_n \mathcal{C}_n$ of graphs called combination locks.¹ For a graph
$C = (V, E)$ in $\mathcal{C}_n$ (the class of $n$-node combination locks), $V = \{u_0, u_1, \ldots, u_{n-1}\}$ and either
$(u_i, 0, u_{i+1 \bmod n})$ and $(u_i, 1, u_0) \in E$, or $(u_i, 1, u_{i+1 \bmod n})$ and $(u_i, 0, u_0) \in E$, for all $i < n$ (see
Figure 3a). In order for a robot to "pick a lock" in $\mathcal{C}_n$, that is, to reach node $u_{n-1}$, it must
follow the unique $n$-node simple path from $u_0$ to $u_{n-1}$. Thus any algorithm for a single robot with
no pebbles can expect to take $O(2^n)$ steps to pick a random combination lock in $\mathcal{C}_n$.

¹ Graphs of this sort have been used in theoretical computer science for many years (see [Moo56], for example).
More recently they have reemerged as tools to prove the hardness of learning problems. We are not sure who first
coined the term "combination lock."
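The definition of a combination lock can be made concrete with a short sketch. The code below is illustrative and rests on assumptions not in the text: a lock is represented as a dictionary from node index to a two-element successor list indexed by edge label, and the robot is modeled as a memoryless random walker. The function names are hypothetical.

```python
import random


def combination_lock(combo):
    """Build an n-node lock from a list of n bits.

    At node u_i, label combo[i] leads to u_{(i+1) mod n};
    the other label leads back to u_0.
    """
    n = len(combo)
    graph = {}
    for i, good_label in enumerate(combo):
        successors = [0, 0]
        successors[good_label] = (i + 1) % n
        graph[i] = successors
    return graph


def follow(graph, start, labels):
    """Return the node reached from start by following the edge labels."""
    node = start
    for label in labels:
        node = graph[node][label]
    return node


def steps_to_pick(graph, rng, limit):
    """Random walk from u_0 until u_{n-1} is reached.

    Returns the number of steps taken, or None if limit is exceeded.
    """
    target = len(graph) - 1
    node, steps = 0, 0
    while node != target:
        if steps >= limit:
            return None
        node = graph[node][rng.randrange(2)]
        steps += 1
    return steps


# The lock of Figure 3a, with combination 0, 1, 0, 1, 1.
lock = combination_lock([0, 1, 0, 1, 1])
reached = follow(lock, 0, [0, 1, 0, 1])   # correct prefix reaches u_4
reset = follow(lock, 0, [0, 1, 0, 0])     # one wrong label resets to u_0
attempt = steps_to_pick(lock, random.Random(1), limit=10_000)
```

Because a single wrong label returns the walker to $u_0$, the number of steps a blind walker needs grows quickly with $n$, which is the behavior the lower-bound argument relies on.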




Figure 3: (a) A combination lock whose combination is 0, 1, 0, 1, 1. (b) A graph in $\mathcal{R}_n$. Graphs in
$\mathcal{R} = \bigcup_{n=1}^{\infty} \mathcal{R}_n$ cannot be learned by one robot with a constant number of pebbles.

We construct a restricted family $\mathcal{R}$ of graphs and consider algorithms for a single robot with a
single pebble. For all positive integers $n$, the class $\mathcal{R}_n$ contains all graphs consisting of a directed
ring of $n/2$ nodes with an $n/2$-node combination lock inserted into the ring (as in Figure 3b). Let
$\mathcal{R} = \bigcup_{n=1}^{\infty} \mathcal{R}_n$. We claim that there is no probabilistic algorithm for one robot and one pebble that
learns arbitrary graphs in $\mathcal{R}$ in polynomial time with high probability.

To see the claim, consider a single robot in node $u_0$ of a random graph in $\mathcal{R}$. Until the robot
drops its pebble for the first time it has no information about the graph. Furthermore, with
high probability the robot needs to take $O(2^n)$ steps to emerge from a randomly-chosen $n$-node
combination lock unless it drops a pebble in the lock. But since the size of the graph is unknown,
the robot always risks dropping the pebble before entering the lock. If the pebble is dropped outside
the lock, the robot will not see the pebble again until it has passed through the lock. A robot that
cannot find its pebble has no way of marking nodes and cannot learn.

More formally, suppose that there were some probabilistic algorithm for one robot and a pebble
to learn random graphs in $\mathcal{R}$ in polynomial time with probability greater than 1/2. Then there
must be some constant $c$ such that the probability that the robot drops its pebble in its first $c$ steps
is greater than 1/2. (Otherwise, the probability that the algorithm fails to learn in time polynomial
in $n$ is greater than 1/2.) Therefore, the probability that the robot loses its pebble and fails to
learn a random graph in $\mathcal{R}_{2c}$ efficiently is at least 1/2.

A similar argument holds for a robot with a constant number of pebbles. We conjecture that 
even if the algorithm is given n as input, a single robot with a constant number of pebbles cannot 
learn strongly-connected directed graphs. However, using techniques similar to those in Section 6, 
one robot with a constant number of pebbles and prior knowledge of n can learn high-conductance 
graphs in polynomial time with high probability. 

6 Learning High Conductance Graphs 

For graphs with good expansion, learning by walking randomly is more efficient than learning by
using homing sequences. In this section we define conductance and present an algorithm that runs
more quickly than Learn-Graph for graphs with conductance greater than $\sqrt{\log n / (dn^2)}$.

6.1 Conductance 

The conductance [SJ89] of a graph characterizes the rate at which a random walk on the graph
converges to the stationary distribution $\pi$. For a given directed graph $G = (V, E)$, consider a
weighted graph $G' = (V, E, W)$ with the same vertices and edges as $G$, but with edge weights
defined as follows. Let $M = \{m_{i,j}\}$ be the transition matrix of a random walk that leaves $i$ by
each outgoing edge with probability $1/(2 \cdot degree(i))$ and remains at node $i$ with probability 1/2.
Let $P^0$ be an initial distribution on the $n$ nodes in $G$, and let $P^t = P^0 M^t$ be the distribution after
$t$ steps of the walk defined by $M$. (Note that $\pi$ is a steady state distribution if for every node
$i$, $P_i^t = \pi_i \Rightarrow P_i^{t+1} = \pi_i$. For irreducible and aperiodic Markov chains, $\pi$ exists and is unique.)
Then the edge weight $w_{i,j} = \pi_i m_{i,j}$ is proportional to the steady state probability of traversing the
edge from $i$ to $j$. Note that the total weight entering a node is equal to the total weight leaving it;
that is, $\sum_j w_{i,j} = \sum_j w_{j,i}$.

Consider a set $S \subseteq V$ which defines a cut $(S, \bar{S})$. For sets of nodes $S$ and $T$, let $W_{S,T} =
\sum_{s \in S, t \in T} w_{s,t}$. We denote $W_{S,V}$ by $W_S$, so $W_V$ represents the total weight of all edges in the graph.
Then the conductance of $S$ is defined as $\phi_S = W_{S,\bar{S}} / \sum_{i \in S} \pi_i = W_{S,\bar{S}} / W_S$.

The conductance of a graph is the least conductance over all cuts whose total weight is at most
$W_V/2$: $\phi(G) = \min_S \{\max(\phi_S, \phi_{\bar{S}})\}$. The conductance of a directed graph can be exponentially
small.

Mihail [Mih89] shows that after a walk of length $\phi^{-2} \log(2n/\epsilon^2)$, the $L_1$ norm of the distance
between the current distribution $P$ and the stationary distribution $\pi$ is at most $\epsilon$ (i.e. $\sum_i |P_i - \pi_i| \leq
\epsilon$). In the rest of this section, a choice of $\epsilon = 1/n^2$ is sufficient, so a random walk of length
$\phi^{-2} \log(2n^5)$ is used to approximate the stationary distribution. We call $T = \phi^{-2} \log(2n^5)$ the
approximate mixing time of a random walk on an $n$-node graph with conductance $\phi$.
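For a small graph these quantities can be computed directly. The following sketch is illustrative only and makes several assumptions: nodes are numbered $0, \ldots, n-1$, the graph is strongly connected, the stationary distribution is approximated by power iteration, the conductance is found by brute force over all cuts (exponential in $n$), and the logarithm in $T$ is taken to be the natural logarithm. The function names are hypothetical.

```python
import itertools
import math


def lazy_walk_matrix(graph):
    """Transition matrix M: stay with prob. 1/2, else take a uniform out-edge."""
    n = len(graph)
    M = [[0.0] * n for _ in range(n)]
    for i, successors in graph.items():
        M[i][i] += 0.5
        for j in successors:
            M[i][j] += 0.5 / len(successors)
    return M


def stationary(M, iterations=2000):
    """Approximate pi via P^t = P^0 M^t, starting from the uniform P^0."""
    n = len(M)
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [sum(p[i] * M[i][j] for i in range(n)) for j in range(n)]
    return p


def conductance(M, pi):
    """min over cuts (S, S-bar) of max(phi_S, phi_{S-bar}); small n only."""
    n = len(M)
    best = math.inf
    for size in range(1, n):
        for S in itertools.combinations(range(n), size):
            inside = set(S)
            outside = set(range(n)) - inside
            w_out = sum(pi[i] * M[i][j] for i in inside for j in outside)
            w_in = sum(pi[i] * M[i][j] for i in outside for j in inside)
            phi_s = w_out / sum(pi[i] for i in inside)
            phi_sbar = w_in / sum(pi[i] for i in outside)
            best = min(best, max(phi_s, phi_sbar))
    return best


def approx_mixing_time(phi, n):
    """T = phi^-2 * log(2 n^5), with the natural log assumed."""
    return phi ** -2 * math.log(2 * n ** 5)


# Example: a directed ring on 4 nodes (one outgoing edge per node).
ring = {0: [1], 1: [2], 2: [3], 3: [0]}
M = lazy_walk_matrix(ring)
pi = stationary(M)
phi = conductance(M, pi)
T = approx_mixing_time(phi, len(ring))
```

On the 4-node ring the stationary distribution is uniform and the minimizing cut splits the ring into two arcs of two nodes each, giving $\phi = 1/4$.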
6.2 An Algorithm for High Conductance Graphs

If a graph has high conductance it can be learned more quickly. In a high-conductance graph, we
can estimate the steady state probability of node $i$ by leaving Clark at node $i$ while Lewis takes $w$
random walks of $\phi^{-2} \log(2n^5)$ steps each. Let $x$ be the number of times that Lewis sees Clark at
the last step of a walk. If $w$ is large enough, $x/w$ is a good approximation to $\pi_i$.
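The estimate $x/w$ can be simulated in a few lines. This is a minimal sketch under stated assumptions: Lewis follows the lazy walk of Section 6.1 (staying put with probability 1/2), each walk continues from wherever the previous one ended, and the graph is given as a dictionary from node to successor list. The function name and the example graph are hypothetical.

```python
import random


def estimate_stationary(graph, clark, lewis_start, w, T, rng):
    """Return x / w, where x counts walks of T steps that end at Clark's node."""
    lewis = lewis_start
    x = 0
    for _ in range(w):
        for _ in range(T):
            if rng.random() < 0.5:
                continue                      # lazy step: stay in place
            lewis = rng.choice(graph[lewis])  # otherwise take a random out-edge
        if lewis == clark:
            x += 1
    return x / w


# Example: directed ring on 4 nodes, whose stationary distribution is uniform.
ring = {0: [1], 1: [2], 2: [3], 3: [0]}
estimate = estimate_stationary(ring, clark=0, lewis_start=0,
                               w=4000, T=50, rng=random.Random(0))
```

With enough walks the estimate concentrates around the true stationary probability (1/4 for every node of this ring), which is what lets the algorithm separate likely from unlikely nodes with a threshold.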

Definitions: Call a node $i$ a likely node if $\pi_i > 1/(2n) + 1/n^2$. Note that every graph must have at
least one such node. (The $1/n^2$ term appears because of the distance $\epsilon = 1/n^2$ from the stationary
distribution. Its inclusion here simplifies the analysis later.) A node that is not likely is called
unlikely.

Algorithm Learn-Graph2 uses this estimation technique to find a likely node $u_0$ and then calls
the procedure Build-Map to construct a map of $G$ starting from $u_0$. The procedure Build-Map
learns at least one new edge each iteration by sending Lewis across an unexplored edge $(u, \ell, v)$ of
some unfinished node $u$ in map. Clark waits at start node $u_0$ while Lewis walks randomly until
he meets Clark. (If $u_0$ is a likely node, this walk is expected to take $O(Tn)$ steps.) Lewis stores
this random walk in the variable path. Thus, $path_i$ is the label of the edge traversed at the $i$th step
of the random walk, $path[i \ldots j]$ represents edges $path_i$ to $path_j$, and $|path|$ represents the length of
path. We say that path-step(Lewis) $= i$ if Lewis has just crossed the $i$th edge on path.



Learn-Graph2(w, B, M, T):

1  done := FALSE
2  T := φ^{-2} log(2n^5)                                              { the mixing time }
3  while (not (done))
4     do map := ({u_0}, ∅)                                            { map is the graph consisting of node u_0 and no edges }
5        lost := FALSE
6        Lewis and Clark together take a random walk of length T
7        Lewis takes w random walks of length T                       { approx. stationary prob. of Loc_G(Clark) }
8        x := number of walks where Lewis and Clark are together at the last step
9        path := the path Lewis followed since leaving Clark
10       if x/w < B                                                   { bound B < 1/n chosen for ease of proof }
11          then Clark follows path to catch up to Lewis              { not at a frequently-visited node }
12          else Lewis moves randomly until he sees Clark             { call node where they meet u_0 }
13               done := Build-Map(map, M, T)
14 return map



The procedure Compress-Path returns the shortest subpath of path that connects $v$ to $u_0$.
Finally, Truncate-Path-At-Map compares nodes on the path with all nodes in map and returns
the shortest subpath connecting $v$ to some node in map. By adding the final path to the map,
Build-Map connects the new node $v$ to map, so map always represents a strongly connected
subgraph of $G$. Figure 4 illustrates a single iteration of the main loop in Build-Map.

Algorithm Learn-Graph2 takes as input parameters the number of random walks $w$, a bound
$B$ to separate the probability of likely and unlikely nodes (we choose $B$ to be approximately $3/(4n)$),
the mixing time $T$, and a quantity $M$. This quantity is chosen so that the probability of a robot's
Build-Map(map, M, T):

1  while there are unfinished nodes in map and not lost             { Lewis and Clark both at u_0 }
2     do u_i := an unfinished node in map
3        restart := a minimal path in map from u_0 to u_i
4        m := largest index of the nodes in map
5        path := empty path
6        Lewis follows restart and traverses unexplored edge ℓ       { Lewis crosses a new edge }
7        while length(path) < MT and robots are not together         { Lewis walks randomly ... }
8           do Lewis traverses a random edge ℓ' and adds ℓ' to end of path
9        if robots are together                                      { ... until he sees Clark at u_0 }
10          then both robots follow restart to u_i and cross edge ℓ
11               path := Compress-Path(path, restart)                { removes loops from path }
12               path, u_j := Truncate-Path-At-Map(path, map)        { shortest path back to map }
13               if |path| = 0
14                  then add edge (u_i, ℓ, u_j) to map
15                  else add nodes u_{m+1}, ..., u_{m+|path|} to map
16                       add edges (u_i, ℓ, u_{m+1}) and (u_{m+|path|}, path_{|path|}, u_j) to map
17                       ∀k, 1 ≤ k < |path|, add edges (u_{m+k}, path_k, u_{m+k+1})
18               both robots move to u_0
19          else                                                     { if Lewis walks MT steps without seeing Clark }
20               Clark follows restart, ℓ, path to catch up to Lewis
21               lost := TRUE
22 if lost
23    then return FALSE
24    else return TRUE



Compress-Path(path, restart):

1  while Clark not at end of path                                    { Lewis and Clark both at u_0 }
2     do while Lewis not at end of path
3           do Lewis traverses the next edge of path
4              if Lewis and Clark are together                       { found a loop in path: remove it }
5                 then path := path[1 ... path-step(Clark)] ∘ path[path-step(Lewis) + 1 ... |path|]
6        Lewis follows restart and edge ℓ
7        Lewis traverses edges path[1 ... path-step(Clark)]
8        both robots are now together and traverse one edge of path
9  return path                                                       { Lewis and Clark both at u_0 }
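The loop-removal idea behind Compress-Path can be sketched in simulation. This is not the procedure itself: the sketch below assumes the simulator can see the true node identities (which the robots can only detect by meeting), it omits Lewis's walk back along restart after each pass, and it represents the graph as a dictionary from node to a successor list indexed by edge label. The function name and the example graph are hypothetical.

```python
def compress_path(graph, start, path):
    """Remove loops from a path of edge labels that begins at node start.

    Clark advances one edge at a time; for each Clark position, Lewis walks
    the rest of the path and every return to Clark's node is cut out.
    """
    path = list(path)
    clark, clark_step = start, 0
    while clark_step < len(path):
        lewis, lewis_step = clark, clark_step
        while lewis_step < len(path):
            lewis = graph[lewis][path[lewis_step]]
            lewis_step += 1
            if lewis == clark:
                # path := path[1..step(Clark)] followed by path[step(Lewis)+1..]
                del path[clark_step:lewis_step]
                lewis_step = clark_step
        if clark_step < len(path):
            clark = graph[clark][path[clark_step]]   # both robots advance
            clark_step += 1
    return path


# Hypothetical 4-node graph with edge labels 0 and 1.
G = {0: [1, 0], 1: [2, 1], 2: [0, 3], 3: [3, 0]}
# From node 1 the labels below visit nodes 1, 1, 2, 3, 3, 0.
compressed = compress_path(G, 1, [1, 0, 1, 0, 1])
```

In the example the self-loops at nodes 1 and 3 are removed, leaving the labels of the simple path 1, 2, 3, 0.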



Truncate-Path-At-Map(path, map):

1  earliest := |path|                       { first position on path that is a node already in map }
2  earliest-node := u_0                     { the name of this node }
3  for each node u_k in map
4     Clark moves to u_k
5     Lewis follows restart and edge ℓ
6     while Lewis not at end of path
7        do if Lewis and Clark are together and path-step(Lewis) < earliest
8              then earliest := path-step(Lewis)
9                   earliest-node := u_k
10          Lewis traverses next edge on path
11 both robots move to u_0
12 return path[1 ... earliest], earliest-node
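A simulation of the truncation step follows the pseudocode closely. As before, this is a sketch under assumptions: the simulator knows the true location of every map node (standing in for Clark physically waiting there), the graph is a dictionary from node to successor list indexed by label, and the map is given as a dictionary from hypothetical node names such as "u0" to true locations.

```python
def truncate_path_at_map(graph, start, path, map_locations):
    """Return (prefix of path, name of the first map node reached).

    start is the node Lewis occupies after crossing the new edge; the default
    answer is the whole path, which ends at u_0.
    """
    earliest, earliest_node = len(path), "u0"
    for name, location in map_locations.items():   # Clark waits at each u_k
        lewis, step = start, 0
        while step < len(path):
            if lewis == location and step < earliest:
                earliest, earliest_node = step, name
            lewis = graph[lewis][path[step]]
            step += 1
    return path[:earliest], earliest_node


# Hypothetical 4-node graph with edge labels 0 and 1.
G = {0: [1, 0], 1: [2, 1], 2: [0, 3], 3: [3, 0]}
known = {"u0": 0, "u1": 3}
# Following labels 0, 1, 1 from node 1 visits nodes 1, 2, 3, 0.
prefix, hit = truncate_path_at_map(G, 1, [0, 1, 1], known)
```

Here the path first touches the map at the node named "u1" after two steps, so only the first two labels are kept. If Lewis's starting node is itself already in the map, the returned prefix is empty, which corresponds to the case $|path| = 0$ in Build-Map.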




Figure 4: Procedure Build-Map during one execution of the while loop. The ovals represent map,
the portion of graph $G$ learned so far. Note that map is strongly connected. Node $u_0$, the first
node added to map, is with high probability a node with a large stationary probability (a likely
node). The robots find $u_0$ in procedure Learn-Graph2 using random walks in line 2 of Build-Map.
The robots agree on a node $u_i$ with unexplored outgoing edges (an unfinished node). Then
Lewis moves to $u_i$ and follows the unexplored edge $\ell$, while Clark stays at $u_0$. Since $\ell$ is unexplored,
Lewis is now at an unknown node. Lewis walks randomly until, visiting $u_0$, he finds Clark. The
dotted line of Figure 4 [a] depicts this random walk, denoted path. Random walk path may pass
through the same node many times. In procedure Compress-Path, the robots collectively remove
all of the loops from the path (reducing the path to the solid line in [a] and [b]). In procedure
Truncate-Path-At-Map, the robots find $u_j$, the first node in path already in map. All the nodes
and edges of path until $u_j$ (the bold line in [b]) are added to map.
starting at a likely node and walking randomly for $MT$ steps without returning to its start node is
very small.

In sections 6.3 and 6.4 we prove the following theorems. 

Theorem 13 When Learn-Graph2 halts, it outputs a graph isomorphic to G. 



Theorem 14 Suppose Learn-Graph2 is run on a graph $G$ with $w = \sqrt{(4+\tau)dn^3}$ and $M =
(4+\tau)dn^2$ for some constant $\tau > 0$, and $B = \frac{3}{4n}\left(1 + \frac{2}{n}\right)$. Then for sufficiently large $n$, with
probability at least $1 - \epsilon$ Learn-Graph2 halts within $O((4+\tau)dn^3T)$ steps, where

$$\epsilon = e^{-\frac{1}{20}\sqrt{(4+\tau)nd}} + e^{-\frac{1}{2}\sqrt{(4+\tau)nd}} + e^{-\frac{\tau dn}{4}}.$$

6.3 Correctness of Learn-Graph2 

In this section, we prove the correctness of each procedure in Learn-Graph2. 

Lemma 15 The procedure Compress-Path halts in $O(n|path|)$ steps and returns a path in which
no node occurs more than once.

Proof: We prove the following invariant by induction: in Compress-Path, whenever Lewis
reaches the end of path, each node in $path[1 \ldots \text{path-step(Clark)}]$ appears at most once in the entire
path.

Assume that this claim holds after Clark has crossed the first $k$ edges in path. By the inductive
hypothesis, we know that when Clark crosses the $(k+1)$st edge, he arrives at some new node not
previously encountered along path. Now Lewis follows the entire path. Whenever the path loops
back to $Loc_G(Clark)$, the loop is removed from the path. Thus, all repeated occurrences of the new
node are removed from the path, proving the inductive step.

Since there are $n$ nodes in the graph, Clark can only make $n$ moves before he returns to $u_0$. Lewis
can move at most $|path|$ steps for every move of Clark's, so the total running time is $O(n|path|)$. □

Lemma 16 The procedure Truncate-Path-At-Map finds the index of the first path step leading
to a node already in map. The algorithm runs in $O(n^2)$ steps.

Proof: For each node $u_k$ in map, Lewis traverses the path once while Clark waits at $u_k$. The
procedure keeps track of the earliest node found that is already in map, so the procedure's correctness
follows. Clark takes at most $n$ steps to reach each node $u_k$. Lewis needs at most $n$ steps to follow
the compressed path and $n$ more to return to the start of the path. Thus the algorithm requires no
more than $3n^2$ steps. □

The algorithm Learn-Graph2 halts only when Build-Map returns TRUE. The following 
lemma shows that whenever Build-Map returns TRUE, map is isomorphic to G. 

Lemma 17 In Build-Map, whenever Clark is at $u_0$, map is a good representation of $(Loc_G(Clark), G)$.

Proof: We inductively build a subgraph $G' = (V', E')$ of $G = (V, E)$ and construct an isomorphism
$f$ from map to $G'$. Initially, Lewis and Clark are both at $u_0$ and map consists of the single node
$u_0$ and no edges. Let $V' = \{Loc_G(Clark)\}$, $E' = \emptyset$, and $f(u_0) = Loc_G(Clark)$. Then map is a good
representation of $(Loc_G(Clark), G)$.

Consider the robots starting an iteration of the first while loop in Build-Map. Both robots
are together at $u_0$. Inductively assume that map is a good representation of $(Loc_G(Clark), G)$ and
that map is strongly connected. Thus, Lewis can always reach an unfinished node in map if one
exists. Lewis walks to an unfinished node $u_i$, crosses a new edge $\ell$ to an unknown node $v$, and
then walks randomly until he returns to $u_0$, where Clark is waiting. From Lemmas 15 and 16, after
the algorithm executes subroutines Compress-Path and Truncate-Path-At-Map, $\ell \circ path$ is a path that
begins at $u_i$, crosses edge $\ell$, ends at $u_j$, and whose intermediate nodes are not represented in map.

If path is empty after Truncate-Path-At-Map, then $v$ is node $u_j$ already in map. Adding
edge $(f(u_i), \ell, f(u_j))$ to $E'$ and $(u_i, \ell, u_j)$ to map therefore maintains the invariant that $G'$ is a
subgraph of $G$ and preserves the isomorphism between map and $G'$.

If path is not empty, then by Lemmas 15 and 16 all nodes reached by starting at $u_i$ and following
path to the end are distinct, and only the last node reached is already in map. Let $m$ be the highest
index so far of any node in map. The algorithm adds new nodes $u_{m+1}, \ldots, u_{m+|path|}$ and new edges
$(u_{m+k}, path_k, u_{m+k+1})$ $\forall k, 1 \leq k < |path|$, $(u_i, \ell, u_{m+1})$, and $(u_{m+|path|}, path_{|path|}, u_j)$ to map. Let
$f(u_{m+k})$ be the location of Lewis in $G$ after Lewis has crossed the $k$th edge of the path. Add the
$|path| - 1$ new nodes to $V'$, and edges $(f(u_{m+k}), path_k, f(u_{m+k+1}))$ to $E'$. Then $f$ is an isomorphism
from map to $G' \subseteq G$, so map is a good representation of $(Loc_G(Clark), G)$. Since path connects an
unfinished node to another node in map, map remains strongly connected. □

When Build-Map halts and returns TRUE, there are no unfinished nodes in map. Since
map is isomorphic to $G' \subseteq G$ and has no unfinished nodes, map must have the same number
of nodes and edges as $G$. Therefore, map is isomorphic to $G$ when Build-Map returns TRUE,
proving Theorem 13. □

6.4 Running Time and Failure Probability of Learn-Graph2 

We proved that when the algorithm terminates it is correct. In this section, we prove Theorem 14 
by analyzing the probability that the algorithm terminates in a reasonable amount of time. We 
say the algorithm fails if any of the following cases holds: 

1. Algorithm Learn-Graph2 finds an unlikely node but estimates that it is a likely node. 

2. Algorithm Learn-Graph2 fails to find a likely node in the allotted time. We allow $w =
\sqrt{(4+\tau)dn^3}$ iterations, each consisting of $w$ random walks.

3. Algorithm Learn-Graph2 calls Build-Map from a likely node, but Build-Map returns 
FALSE. 

In fact, these conditions overestimate the probability that the algorithm fails to run in
$O((4+\tau)dn^3T)$ steps. The next three lemmas bound the probabilities of each of the three failure conditions.
At the end of the section, we analyze the running time of Learn-Graph2 when no failure condition
occurs.



Lemma 18 (Failure Condition 1) Suppose that Learn-Graph2 is run with $w = \sqrt{(4+\tau)dn^3}$
and $M = (4+\tau)dn^2$. Assume that the algorithm estimates node $u$'s steady-state probability to be
greater than $B = \frac{3}{4n}\left(1 + \frac{2}{n}\right)$. Then the probability that $u$ is not a likely node is at most
$e^{-\frac{1}{20}\sqrt{(4+\tau)dn}}$.

Proof: Call each random walk in Learn-Graph2 a phase. Let $X_i$ be the random variable where

$$X_i = \begin{cases} 1 & \text{if Lewis and Clark are together at the end of phase } i \\ 0 & \text{otherwise.} \end{cases}$$

Then $X = \sum_{i=1}^{w} X_i$ is the number of phases where Lewis and Clark end together. If $u$ is an unlikely
node, then $E[X] \leq (w/2n)(1 + (2/n))$, because the estimation of $\pi_u$ could be inaccurate by at most
$\epsilon = 1/n^2$. We therefore bound the quantity

$$\Pr\left[X > \frac{3w}{4n}\left(1 + \frac{2}{n}\right)\right].$$

Using the Chernoff bound from Corollary 11 with $\delta = 1/2$, we get:

$$\Pr\left[X > \frac{3w}{4n}\left(1 + \frac{2}{n}\right)\right] < \left[\frac{e^{1/2}}{(3/2)^{3/2}}\right]^{\frac{1}{2}\sqrt{(4+\tau)dn}} = \left(\frac{8e}{27}\right)^{\frac{1}{4}\sqrt{(4+\tau)dn}} = e^{-\frac{1}{4}\ln\left(\frac{27}{8e}\right)\sqrt{(4+\tau)dn}} \leq e^{-\frac{1}{20}\sqrt{(4+\tau)dn}}.$$

□
Lemma 19 (Failure Condition 2) The probability that Learn-Graph2 fails to find and recognize
a likely node after $\sqrt{(4+\tau)dn^3}$ iterations is at most $e^{-\frac{1}{2}\sqrt{(4+\tau)dn}}$ for sufficiently large $n$.

Proof: Define a good node to be a node with steady state probability at least $1/n$. (Note that
every graph has at least one good node.) We can bound the probability that we fail to find and
recognize a likely node by the probability that we fail to identify a good node within $w$ iterations.
First, we bound the probability that the algorithm fails to recognize a good node when testing
one. Random variables $X_i$ and $X$ are defined as in Lemma 18. Then, since the stationary probability
of a good node is at least $1/n$,

$$E(X) \geq w\left(\frac{1}{n} - \frac{1}{n^2}\right).$$

To simplify the math, note that for sufficiently large $n$,

$$w\left(\frac{1}{n} - \frac{1}{n^2}\right) \geq \frac{15w}{16n}.$$

Then by the Chernoff bound in Corollary 11, for $\delta = 1/5$,

$$\Pr\left[X < \frac{3w}{4n}\right] = \Pr\left[X < (1-\delta)\frac{15w}{16n}\right] < e^{-\frac{1}{2}\cdot\frac{15}{16}\cdot\frac{1}{25}\sqrt{(4+\tau)dn}} = e^{-\frac{3}{160}\sqrt{(4+\tau)dn}}.$$

Define $\gamma$ to be the quantity

$$\gamma = \left(1 - \frac{1}{n}\right)\left(1 - e^{-\frac{3}{160}\sqrt{(4+\tau)dn}}\right).$$

The probability that the node reached at the end of a random walk of both robots is a good node
and is identified as such is at least

$$\left(\frac{1}{n} - \frac{1}{n^2}\right)\left(1 - e^{-\frac{3}{160}\sqrt{(4+\tau)dn}}\right) = \frac{\gamma}{n}.$$

Therefore the probability that after $w$ trials, no likely node is found and recognized is at most

$$\left(1 - \frac{\gamma}{n}\right)^{w} = \left(1 - \frac{\gamma}{n}\right)^{n\sqrt{(4+\tau)dn}} \leq e^{-\gamma\sqrt{(4+\tau)dn}}.$$

Note that $\gamma$ approaches 1 as $n$ increases. For sufficiently large $n$, $\gamma > 1/2$, so the probability
that the robots fail to find a likely node after $w$ trials is at most $e^{-\frac{1}{2}\sqrt{(4+\tau)dn}}$. □

Now we analyze the running time of Build-Map. The procedure Build-Map executes the
main while loop at most once for each of the $dn$ edges in the graph. Let $k_i$ be the length of the
random walk in the $i$th iteration of the while loop. Then let $K = \sum_{i=1}^{dn} k_i$ be the total length of all
the random walks in the algorithm. Since by Lemmas 15 and 16 Compress-Path runs in $O(nk_i)$
steps and Truncate-Path-At-Map runs in $O(n^2)$ steps, the $i$th iteration takes $O(n + k_i + nk_i + n^2)$
steps. Therefore the total running time of the algorithm is $O(dn^3 + nK)$.

Lemma 20 (Failure Condition 3) If $u_0$ is a likely node, then with probability at least $1 - e^{-\frac{\tau dn}{4}}$,
$K \leq (4+\tau)dn^2T$.

Proof: We use an amortized analysis to prove the bound on $K$. First we subdivide all of Lewis'
random walks during Build-Map into periods of $T = \phi^{-2} \log(2n/\epsilon^2)$ steps each, where $T$ is the
approximate mixing time. Recall that if Lewis starts from any node, after $T$ steps Lewis is at node
$u_k$ with probability between $\pi_k - 1/n^2$ and $\pi_k + 1/n^2$. Thus, Lewis' position after the $k$th period
is almost independent of Lewis' position after the $(k+1)$st period.

We associate a 0/1-valued random variable, $X_k$, with the $k$th period of Lewis' random walk:

$$X_k = \begin{cases} 1 & \text{if Lewis is at node } u_0 \text{ at the end of the } k\text{th period} \\ 0 & \text{otherwise.} \end{cases}$$

Since $u_0$ is a likely node, $X_k = 1$ with probability at least $1/(2n)$. Let $X = \sum_{k=1}^{(4+\tau)dn^2} X_k$ be the
number of times Lewis returns to $u_0$ in $(4+\tau)dn^2$ periods. Note that $E(X)$ is at least $(4+\tau)dn/2$.
Using Chernoff bounds with $\delta = 1 - (2/(4+\tau))$ and $\mu = (4+\tau)dn/2$, so that $(1-\delta)\mu = dn$, we find:

$$\Pr[X < dn] = \Pr[X < (1-\delta)\mu] < e^{-\frac{(4+\tau)dn}{4}\left(1 - \frac{2}{4+\tau}\right)^2} = e^{-\frac{(4+\tau)dn}{4}\left(1 - \frac{4}{4+\tau} + \frac{4}{(4+\tau)^2}\right)} = e^{-\left(\frac{\tau}{4} + \frac{1}{4+\tau}\right)dn} < e^{-\frac{\tau dn}{4}}.$$

□

Thus, with high probability Build-Map runs in $O(dn^3 + dn^3(4+\tau)T)$ steps. Suppose none
of the failure conditions occurs in a run of Learn-Graph2. Then the execution never calls
Build-Map on an unlikely node, does call Build-Map on a likely node, and when it does,
Build-Map returns TRUE. Therefore Learn-Graph2 makes at most $w$ steady-state probability
estimates, each taking $O(wT)$ steps, before calling Build-Map once. Therefore, the running
time is $O((4+\tau)dn^3T)$, proving Theorem 14. □

6.5 Exploring Without Prior Knowledge 

Prior knowledge of $n$ is used in two ways in Learn-Graph2: to estimate the stationary probability
of a node and to compute the mixing time $T$. If $T$ is known but $n$ is not, the algorithm can forego
estimating $\pi_i$ entirely and simply run Build-Map after step 6. The removal of lines 7-12 from
Learn-Graph2 yields a new algorithm whose expected running time is polynomially slower than
the original. If we know neither $n$ nor $T$, we can run this new algorithm using standard doubling to
estimate the quantity $MT$. This quantity can be used in line 6 of Learn-Graph2 and also in line
7 of Build-Map as an upper bound on the length of the random walks. Thus no prior knowledge
of the graph is necessary.
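The standard doubling technique mentioned above can be sketched as follows. This is only an outline under assumptions: `run_with_budget` is a hypothetical callable standing in for one run of the modified algorithm with the given value used in place of $MT$, returning a map on success and None if the robots get lost.

```python
def learn_with_doubling(run_with_budget, initial_budget=1):
    """Retry with a doubled walk-length budget until a run succeeds.

    Returns the result of the successful run and the budget that was used.
    """
    budget = initial_budget
    while True:
        result = run_with_budget(budget)
        if result is not None:
            return result, budget
        budget *= 2


# Stand-in for the learner: it succeeds once the budget reaches 37 steps.
def fake_run(budget):
    return "map" if budget >= 37 else None


result, used = learn_with_doubling(fake_run)
```

Because the budgets form a geometric series, the total budget spent on failed attempts is less than the final successful budget, so doubling costs at most a constant factor over knowing the right value in advance.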

7 Conclusions and Open Problems 

Note that with high probability, a single robot with a pebble can simulate algorithm Learn-Graph2
with a substantial but polynomial slowdown. However, Learn-Graph2 does not run in
polynomial expected time on graphs with exponentially-small conductance. An open problem is
to establish tight bounds on the running time of an algorithm that uses one robot and a constant
number of pebbles to learn an $n$-node graph $G$. We conjecture that the lower bound will be a
function of $\phi(G)$, but there may be other graph characteristics (e.g., cover time) which yield better
bounds. It would also be interesting to establish tight bounds on the number of pebbles a single
robot needs to learn graphs in polynomial time.

Another direction for future work is to find other special classes of graphs that two robots can 
learn substantially more quickly than general directed graphs, and to find efficient algorithms for 
these cases. 